Search for: All records

Creators/Authors contains: "Roy, Nirupam"


Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Spatial Audio Processing with Large Language Model on Wearable Devices

Mishra, Ayushi; Bai, Yang; Narayanasamy, Priyadarshan; Garg, Nakul; Roy, Nirupam (July 2025, International Conference on Machine Learning (ICML))

Integrating spatial context into large language models (LLMs) has the potential to revolutionize human-computer interaction, particularly on wearable devices. In this work, we present a novel system architecture that incorporates spatial speech understanding into LLMs, enabling contextually aware and adaptive applications for wearable technologies. Our approach leverages microstructure-based spatial sensing to extract precise Direction of Arrival (DoA) information using a monaural microphone. To address the lack of an existing dataset of microstructure-assisted speech recordings, we synthetically create a dataset called OmniTalk from the LibriSpeech dataset. This spatial information is fused with linguistic embeddings from OpenAI's Whisper model, allowing each modality to learn complementary contextual representations. The fused embeddings are aligned with the input space of the LLaMA-3.2 3B model and fine-tuned with the lightweight adaptation technique LoRA to optimize for on-device processing.
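The abstract names LoRA as the adaptation technique that keeps fine-tuning lightweight. As a hedged illustration of the general LoRA idea (not this paper's configuration), the sketch below shows the standard forward pass, in which a frozen weight matrix W is augmented by a scaled low-rank product B·A. It also counts the trainable parameters. The function names, the rank, the scaling factor alpha, and the 3072×3072 layer size in the usage note are illustrative assumptions.

```python
# Minimal LoRA forward pass: y = W x + (alpha / r) * B (A x).
# W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
# Names and sizes are illustrative assumptions, not the paper's setup.
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Frozen base projection plus a scaled rank-r update."""
    r = len(a)  # adapter rank = number of rows in A
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y0 + scale * dy for y0, dy in zip(base, delta)]


def trainable_params(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """Parameters updated by full fine-tuning vs. by a rank-r LoRA adapter."""
    return d_out * d_in, r * (d_in + d_out)
```

For a hypothetical 3072×3072 projection with rank 8, `trainable_params(3072, 3072, 8)` gives 9,437,184 parameters for full fine-tuning versus 49,152 for the adapter. That reduction is why low-rank adaptation is attractive for resource-constrained devices.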
Full Text Available
Large Network UWB Localization: Algorithms and Implementation

Garg, Nakul; Shahid, Irtaza; Sheshadri, Ramanujan; Sundaresan, Karthikeyan; Roy, Nirupam (April 2025, USENIX Symposium on Networked Systems Design and Implementation (NSDI))

Localization of networked nodes is an essential problem in emerging applications, including first-responder navigation, automated manufacturing lines, vehicular and drone navigation, asset tracking, the Internet of Things, and 5G communication networks. In this paper, we present Locate3D, a novel system for peer-to-peer node localization and orientation estimation in large networks. Unlike traditional range-only methods, Locate3D introduces angle-of-arrival (AoA) data as an additional network topology constraint. The system tackles three key challenges: it uses angles to reduce the number of required measurements by 4×, it jointly uses range and angle data for location estimation, and it employs a spanning-tree approach for fast location updates while ensuring that the output graphs are rigid and uniquely realizable, even in occluded or weakly connected areas. Locate3D cuts latency by up to 75% without compromising accuracy, outperforming standard range-only solutions. It achieves a median localization error of 0.86 m for building-scale multi-floor networks (32 nodes, 0 anchors) and 12.09 m for large-scale networks (100,000 nodes, 15 anchors).
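To make the range-plus-angle intuition concrete, here is a hypothetical, heavily simplified sketch that is not Locate3D's algorithm. It assumes noise-free range, azimuth, and elevation measurements already expressed in one shared global frame. It places every node by walking a breadth-first spanning tree from a root, so a single range-and-angle link fixes each node relative to its parent. The names `place_nodes` and `offset` and the edge format are illustrative assumptions.

```python
# Hypothetical sketch: place nodes along a BFS spanning tree, converting each
# edge's (range, azimuth, elevation) into a 3D offset from the parent node.
# Assumes noise-free angles in a shared global frame (a strong simplification).
import math
from collections import deque
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
# edges[u] -> list of (v, range_m, azimuth_rad, elevation_rad) measured from u to v
Edges = Dict[str, List[Tuple[str, float, float, float]]]


def offset(rng: float, az: float, el: float) -> Point:
    """Spherical (range, azimuth, elevation) to a Cartesian offset."""
    return (
        rng * math.cos(el) * math.cos(az),
        rng * math.cos(el) * math.sin(az),
        rng * math.sin(el),
    )


def place_nodes(edges: Edges, root: str) -> Dict[str, Point]:
    """Assign coordinates along a BFS spanning tree rooted at `root` (the origin)."""
    pos: Dict[str, Point] = {root: (0.0, 0.0, 0.0)}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, rng, az, el in edges.get(u, []):
            if v in pos:  # already placed, so this edge is not a tree edge
                continue
            dx, dy, dz = offset(rng, az, el)
            ux, uy, uz = pos[u]
            pos[v] = (ux + dx, uy + dy, uz + dz)
            queue.append(v)
    return pos
```

With `{"A": [("B", 10.0, 0.0, 0.0)], "B": [("C", 5.0, math.pi / 2, 0.0)]}` and root `"A"`, node B lands at about (10, 0, 0) and C at about (10, 5, 0). A real system must also handle noisy angles, unknown node orientations, and redundant non-tree edges, which this sketch ignores.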
Full Text Available
Demo: Scalable and Sustainable Asset Tracking with NextG Cellular Signals

https://doi.org/10.1145/3636534.3698837

Garg, Nakul; Ghosh, Aritrik; Roy, Nirupam (December 2024, ACM MobiCom)

This demonstration presents LiTEfoot, an ultra-low power localization system leveraging ambient cellular signals. To address the limitations of traditional GPS-based tracking systems in terms of power consumption and latency, LiTEfoot employs a non-linear transformation of the cellular spectrum to achieve efficient self-localization. Our design uses a simple envelope detector to realize spectrum folding, enabling the identification of multiple active base stations.
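The abstract attributes spectrum folding to a simple envelope detector acting as a non-linear element. As an idealized illustration of that general effect (not LiTEfoot's signal chain), the sketch below models the detector as a square-law device and shows that two input tones produce an output component at their difference frequency. The sample rate, tone frequencies, and function names are illustrative assumptions.

```python
# Idealized illustration: a square-law envelope detector mixes two RF tones and
# creates a component at their difference frequency (intermodulation).
# Model assumption: detector output = input**2, with no filtering or noise.
import cmath
import math


def tone_amplitude(signal, fs, freq):
    """Amplitude of the component at `freq`, via a single-bin DFT."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return 2 * abs(acc) / n


def square_law(signal):
    """Idealized envelope detector: instantaneous power of the input."""
    return [x * x for x in signal]


fs = 1000          # sample rate in Hz (illustrative)
n = 1000           # one second of samples, so integer-Hz tones fall on exact DFT bins
f1, f2 = 100, 130  # two tones standing in for signals from two transmitters
rf = [math.cos(2 * math.pi * f1 * k / fs) + math.cos(2 * math.pi * f2 * k / fs)
      for k in range(n)]
out = square_law(rf)
```

Expanding the square gives a DC term, components at 2·f1, 2·f2, and f1 + f2, and a unit-amplitude component at f2 − f1. Accordingly, `tone_amplitude(out, fs, 30)` is about 1, while the same 30 Hz bin of the input `rf` is about 0.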
Full Text Available
LiTEfoot: Ultra-low-power Localization using Ambient Cellular Signals

https://doi.org/10.1145/3666025.3699356

Garg, Nakul; Ghosh, Aritrik; Roy, Nirupam (November 2024, ACM)

In this paper, we introduce a low-power wide-area cellular localization system called LiTEfoot. The core architecture of the radio carefully applies a non-linear transform to the entire cellular spectrum to obtain a systematic superimposition of the synchronization signals at the baseband. The system develops methods to simultaneously identify, from the transformed signal, all the base stations that are active in any cellular band. The radio front end uses a simple envelope detector to realize the non-linear transformation. We build on this low-power radio to implement a self-localization system leveraging ambient 4G-LTE signals. We show that the core system can also be extended to other cellular technologies such as 5G-NR and NB-IoT. The prototype achieves a median localization error of 22 meters in urban areas and 50 meters in rural areas. It can sense 3 GHz of wideband LTE spectrum in 10 ms using non-linear intermodulation while consuming 0.9 mJ of energy in a PCB-based implementation and 40 μJ in CMOS simulation. In other words, LiTEfoot tags can last for 11 years on a coin cell while continuously estimating location every 5 seconds. We believe that LiTEfoot will have widespread implications for city-scale asset tracking and other location-based services. The radio architecture can be useful beyond low-power self-localization and can find application in synchronization and communication on battery-less platforms.
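The 11-year claim follows from the per-update energy and the update interval. The back-of-the-envelope sketch below uses only the figures quoted in the abstract (40 μJ per update, one update every 5 s, 11 years) and derives the implied average power and total energy. It deliberately ignores sleep current, regulator losses, and battery self-discharge, so it is a lower bound on what a real cell must supply. The helper names are illustrative.

```python
# Energy budget implied by the quoted figures: 40 uJ per location update
# (CMOS simulation), one update every 5 s, sustained for 11 years.
# Ignores sleep current, regulator losses, and battery self-discharge.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def average_power_w(energy_per_update_j: float, interval_s: float) -> float:
    """Mean power draw when one update is performed every `interval_s` seconds."""
    return energy_per_update_j / interval_s


def energy_over_years_j(energy_per_update_j: float, interval_s: float, years: float) -> float:
    """Total energy the battery must deliver over the given lifetime."""
    return average_power_w(energy_per_update_j, interval_s) * years * SECONDS_PER_YEAR


p = average_power_w(40e-6, 5.0)            # about 8 microwatts
e = energy_over_years_j(40e-6, 5.0, 11.0)  # about 2.78 kilojoules
print(f"average power: {p * 1e6:.1f} uW, 11-year energy: {e:.0f} J")
```

To compare this budget with a particular coin cell, convert the cell's rated capacity to joules as capacity (mAh) × nominal voltage (V) × 3.6.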
Full Text Available
Poster: Wideband Cellular Sensing for Real-time, Sustainable Geo-localization Tags

https://doi.org/10.1145/3666025.3699382

Garg, Nakul; Ghosh, Aritrik; Roy, Nirupam (November 2024, ACM)

This paper presents LiTEfoot, an ultra-low-power, wide-area localization system that leverages ambient cellular signals to address the power-consumption and latency limitations of traditional self-localization systems. LiTEfoot uses a non-linear transformation of the cellular synchronization signal to efficiently achieve self-localization by systematically superimposing signals at the baseband. A simple envelope detector realizes this non-linear transformation, enabling the identification of multiple active base stations across any cellular band. The system is designed for low-power operation, consuming only 40 μJ of energy per localization update and achieving a median localization error of 22 meters in urban areas.
Full Text Available
Learning Speaker-Listener Mutual Head Orientation by Leveraging HRTF and Voice Directivity on Headphones

https://doi.org/10.1109/ICASSP48485.2024.10446588

Takawale, Harshvardhan; Roy, Nirupam (April 2024, ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Estimating a speaker's direction and head orientation from binaural recordings can provide critical information for many real-world applications of emerging 'earable' devices, including smart headphones and AR/VR headsets. However, it requires predicting the mutual head orientations of both the speaker and the listener, which is challenging in practice. This paper presents a system for jointly predicting speaker-listener head orientations by leveraging inherent human voice directivity and the listener's head-related transfer function (HRTF) as perceived by the ear-mounted microphones on the listener. We propose a convolutional neural network model that, given a binaural speech recording, can predict the orientation of both the speaker and the listener with respect to the line joining the two. The system builds on the core observation that the recordings from the left and right ears are differentially affected by the voice directivity as well as the HRTF. We also incorporate the fact that voice is more directional at higher frequencies than at lower frequencies. Our proposed system achieves a 90th-percentile error of 2.5 degrees in the listener's head orientation and 12.5 degrees in the speaker's.
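The abstract relies on left and right recordings being affected differently, with stronger directivity at higher frequencies. One simple hand-crafted quantity that exposes such differences is the interaural level difference (ILD) at a given frequency. The sketch below is an illustrative assumption and not the paper's CNN pipeline, which learns its own features. The function names, sample rate, and test tone are hypothetical.

```python
# Illustrative feature (not the paper's CNN pipeline): interaural level
# difference (ILD) at one frequency, computed from a binaural recording.
import cmath
import math
from typing import Sequence


def bin_magnitude(signal: Sequence[float], fs: float, freq: float) -> float:
    """Magnitude of a single DFT bin at `freq` (assumed to be an exact bin)."""
    return abs(sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
                   for k, x in enumerate(signal)))


def ild_db(left: Sequence[float], right: Sequence[float], fs: float, freq: float) -> float:
    """Left-minus-right level at `freq` in dB (positive means louder on the left).

    Assumes the right channel has non-zero energy at `freq`.
    """
    return 20 * math.log10(bin_magnitude(left, fs, freq) / bin_magnitude(right, fs, freq))


fs, n, f = 16000, 1600, 3000  # hypothetical sample rate, length, and test tone
left = [2.0 * math.sin(2 * math.pi * f * k / fs) for k in range(n)]
right = [1.0 * math.sin(2 * math.pi * f * k / fs) for k in range(n)]
ild = ild_db(left, right, fs, f)  # about 6.02 dB, since the left channel is twice as loud
```

Computing such values in several frequency bands would expose the frequency dependence the abstract mentions. A learned model can capture richer cues than a single ILD.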
Full Text Available
Scribe: Simultaneous Voice and Handwriting Interface

https://doi.org/10.1145/3631411

Bai, Yang; Shahid, Irtaza; Takawale, Harshvardhan; Roy, Nirupam (December 2023, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

This paper presents the design and implementation of Scribe, a comprehensive voice processing and handwriting interface for voice assistants. Distinct from prior works, Scribe is a precise tracking interface that can coexist with the voice interface on low-sampling-rate voice assistants. Scribe can be used for 3D free-form drawing, writing, and motion tracking for gaming. Taking handwriting as a specific application, it can also capture natural strokes and the individualized style of writing while occupying only a single frequency. The core technique includes an accurate acoustic ranging method called Cross Frequency Continuous Wave (CFCW) sonar, which enables voice assistants to use ultrasound as a ranging signal while using their regular microphone system as the receiver. We also design a new optimization algorithm that requires only a single frequency for time difference of arrival. The Scribe prototype achieves 73 μm of median error for 1D ranging and 1.4 mm of median error in 3D tracking of an acoustic beacon using the microphone array found in voice assistants. Our in-air handwriting interface achieves 94.1% accuracy with automatic handwriting-to-text software, similar to writing on paper (96.6%). At the same time, the error rate of voice-based user authentication increases only from 6.26% to 8.28%.
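To show why a single continuous ultrasonic tone can resolve sub-millimetre motion, here is a sketch of generic continuous-wave phase ranging. It is explicitly not the paper's CFCW method, whose details the abstract does not give. A change in path length of one wavelength shifts the received phase by 2π, so displacement equals Δφ/(2π)·λ. The speed of sound, the 20 kHz tone, the sign convention (phase grows as the path lengthens), and the function names are assumptions.

```python
# Generic continuous-wave (CW) phase ranging, shown only to illustrate why a
# single ultrasonic tone can resolve sub-millimetre motion. NOT the paper's
# CFCW method; speed of sound, tone frequency and sign convention are assumed.
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def unwrap(phases: List[float]) -> List[float]:
    """Remove 2*pi jumps so successive phase samples change smoothly."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))  # wrap step into [-pi, pi]
        out.append(out[-1] + d)
    return out


def displacement_m(phases: List[float], freq_hz: float) -> List[float]:
    """Path-length change relative to the first sample: d = (dphi / 2pi) * lambda."""
    wavelength = SPEED_OF_SOUND / freq_hz
    unwrapped = unwrap(phases)
    return [(p - unwrapped[0]) / (2 * math.pi) * wavelength for p in unwrapped]
```

At an assumed 20 kHz, the wavelength is about 17.15 mm, so phase samples `[0, math.pi / 2, math.pi]` map to displacements of 0, about 4.29 mm, and about 8.58 mm. Even a modest phase resolution therefore corresponds to micrometre-scale distance changes.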
Full Text Available
Revisiting rotationally excited CH at radio wavelengths: A case study towards W51

https://doi.org/10.1051/0004-6361/202449603

Jacob, Arshia M; Nandakumar, Meera; Roy, Nirupam; Menten, Karl M; Neufeld, David A; Faure, Alexandre; Tiwari, Maitraiyee; Pillai, Thushara G S; Robishaw, Timothy; Durán, Carlos A (December 2024, Astronomy & Astrophysics)

Context. Ever since they were first detected in the interstellar medium, the radio-wavelength (3.3 GHz) hyperfine-structure splitting transitions in the rotational ground state of CH have been observed to show anomalous excitation. Astonishingly, this behaviour has been observed uniformly towards a variety of sources probing a wide range of physical conditions. While the observed level inversion could be explained globally by a pumping scheme involving collisions, describing the extent of 'over-excitation' observed in individual sources required the inclusion of radiative processes involving transitions at higher rotational levels. Therefore, a complete description of the excitation mechanism of the CH ground state observed towards individual sources requires observational constraints from the rotationally excited levels of CH, in particular from its first rotationally excited state (²Π₃/₂, N = 1, J = 3/2). Aims. Given the limited detections of these lines, the objective of this work is to characterise the physical and excitation properties of the rotationally excited lines of CH between the Λ-doublet levels of its ²Π₃/₂, N = 1, J = 3/2 state near 700 MHz, and to investigate their influence on the pumping mechanisms of the ground-state lines of CH. Methods. This work presents the first interferometric search for the rotationally excited lines of CH between the Λ-doublet levels of its ²Π₃/₂, N = 1, J = 3/2 state near 700 MHz, carried out using the upgraded Giant Metrewave Radio Telescope (uGMRT) array towards six star-forming regions: W51 E, Sgr B2 (M), M8, M17, W43, and DR21 Main. Results. We detected the two main hyperfine-structure lines within the first rotationally excited state of CH in absorption towards W51 E. To jointly model the physical and excitation conditions traced by lines from both the ground and first rotationally excited states of CH, we performed non-local thermodynamic equilibrium (non-LTE) radiative transfer calculations using the code MOLPOP-CEP.
These models account for the effects of line overlap, are aided by column-density constraints from the far-infrared (FIR) rotational transitions of CH that connect to the ground state, and use rate coefficients for collisions of CH with H, H₂, and electrons (the latter computed in this work using cross-sections estimated within the Born approximation). Conclusions. The non-LTE analysis revealed that physical conditions typical of diffuse and translucent clouds best reproduce the higher degree of level inversion seen in the ground-state lines at 3.3 GHz, observed at velocities near 66 km s⁻¹ along the sightline towards W51 E. In contrast, the excited lines near 700 MHz are only excited in much denser environments, with n_H ~ 10⁵ cm⁻³, where the anomalous excitation in two of the three ground-state lines is quenched, though not in the 3.264 GHz line. This agrees with our observations and suggests that while FIR pumping and line overlap are essential for exciting and inverting the ground-state lines, excitation to the first rotational level is dominated by collisional excitation from the ground state. For the rotationally excited state of CH, the models indicate low excitation temperatures and column densities of 2 × 10¹⁴ cm⁻². Furthermore, modelling these lines helps us understand the complex spectral features observed in the 532/536 GHz rotational transitions of CH. These transitions, which connect sub-levels of the first rotationally excited state to the ground state, play a crucial role in trapping FIR radiation and enhancing the degree of inversion seen in the ground-state lines. Based on the constrained physical conditions, we predict the potential for detecting hyperfine-splitting transitions arising from higher rotationally excited levels of CH, given their current non-detections.
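Level inversion and "low excitation temperatures" are both statements about the excitation temperature T_ex defined by the Boltzmann relation n_u/n_l = (g_u/g_l)·exp(−hν/(k·T_ex)). The sketch below inverts that standard relation. An inverted population (more molecules per state in the upper level) yields a negative T_ex. The example frequency is the rounded 3.3 GHz quoted above, and the unit degeneracies in the usage note are illustrative assumptions, not CH's actual hyperfine statistical weights.

```python
# Excitation temperature from a two-level population ratio (Boltzmann relation):
#   n_u / n_l = (g_u / g_l) * exp(-h * nu / (k * T_ex))
# An inverted population (n_u/g_u > n_l/g_l) gives a negative T_ex.
import math

H = 6.62607015e-34  # Planck constant, J s (exact SI value)
K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def excitation_temperature(n_u: float, n_l: float, g_u: float, g_l: float,
                           freq_hz: float) -> float:
    """Solve the Boltzmann relation for T_ex (in kelvin)."""
    ratio = (n_u / g_u) / (n_l / g_l)  # population per state, upper over lower
    if ratio == 1.0:
        return math.inf  # equal populations per state: T_ex diverges
    return -H * freq_hz / (K_B * math.log(ratio))
```

At 3.3 GHz, hν/k is only about 0.16 K, so tiny departures of the population ratio from unity already correspond to large |T_ex|. For example, `excitation_temperature(1.01, 1.0, 1.0, 1.0, 3.3e9)` is negative, which signals inversion.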
Full Text Available
Frameworks for Student Research Engagement on Interdisciplinary Civic-Engaged Projects

Kweon, Byoung-Suk; Belle, Lauren; Fisher, Kim; Burke, Tara; Haghtalab, Joy; Roy, Nirupam; Bonsignore, Elizabeth; Sachs, Naomi; Roberts, Jennifer (January 2023, The International Network for the Science of Team Science (INSciTS))

Broadband infrastructure in urban parks may serve crucial functions, including as an amenity that boosts overall park use and as a bridge that propagates WiFi access into contiguous neighborhoods. This project, SCC:PG Park WiFi as a BRIDGE to Community Resilience, has developed a new model, Build Resilience through the Internet and Digital Greenspace Exposure, that leverages off-the-shelf WiFi technology, novel algorithms, community assets, and local partnerships to lower greenspace WiFi costs. This interdisciplinary work draws on computer science, information studies, landscape architecture, and public health. Collaboration methodologies and relational definitions across disciplines are still nascent, especially when paired with civic-engaged, applied research. Student researchers (undergraduate and graduate) are excellent partners in bridging disciplinary barriers and constraints. Their capacity to assimilate multiple frameworks has produced refinements to the project's theoretical lenses and suggested novel socio-technical methodology improvements. Further, they are excellent ambassadors to community partners and stakeholders. In BRIDGE, we tested two mechanisms to augment student research participation. Both leveraged a classic, curriculum-based model, the Partnership for Action Learning in Sustainability (PALS) program. This campus-wide, community-engaged initiative pairs faculty and students with community partners. PALS curates economic, environmental, and social sustainability challenges and scopes projects so that coursework can be customized to address the identified challenges. Outcomes include literature searches, wireframes, and design plans that target solutions to civic problems. Constraints include the short semester time frame and the required curriculum learning outcomes. (1) On BRIDGE, Dr. Kweon ran a semester-based, 400-level Landscape Architecture PALS studio.
Eighteen undergraduates conducted in-class and in-field work to assess community needs and propose design solutions for future park-wide WiFi. Research topics included community-park history, neighborhood demographics, case-study analysis, and land-cover characteristics. The students conducted an in-park community engagement session, using interactive poster-board surveys to gather input on which park amenities might be redesigned or added to promote WiFi use. The students then produced seven redesign plans; one included a café/garden with an eco-corridor that integrated technology with nature. (2) Building on the classic, curriculum-based PALS model, we created a summer intensive for our five research assistants to stimulate interdisciplinary collaboration in their research tasks and co-analysis of project data products: the experimental technical WiFi setup, community survey results, and stakeholder needs assessments. Students met weekly with each other and with team leadership, exchanged journal articles, and attended joint research events. This model shows promise for integrating students more formally into an interdisciplinary research project. An end-of-intensive focus group highlighted the pros and cons of this model from the students' perspective. Results: Contrasting the two mechanisms, we found that Model 1 is tried-and-true and produces standardized, reliable products. However, because the work is group-based, students have limited independence to explore topics and themes of interest. Civic groups are typically thrilled with the diversity of action plans produced. Model 2 provides greater independence in student learning outcomes, fosters interdisciplinary "dictionary-building" that can be used by the full team, deepens methodological approaches, and allows for student stipend payments. Lessons learned: the intensive time frame needed more research-team support and, when possible, should ideally be extended over the full project span. UMD-IRB#1785365-4; NSF-award: 2125526.
Full Text Available
The magnetic field in the dense photodissociation region of DR 21

https://doi.org/10.1093/mnras/staa3898

Koley, Atanu; Roy, Nirupam; Menten, Karl M; Jacob, Arshia M; Pillai, Thushara G; Rugel, Michael R (January 2021, Monthly Notices of the Royal Astronomical Society)

Measuring interstellar magnetic fields is extremely important for understanding their role in different evolutionary stages of interstellar clouds and star formation. However, detecting the weak field is observationally challenging. We present measurements of the Zeeman effect in the 1665 and 1667 MHz (18 cm) lines of the hydroxyl radical (OH) towards the dense photodissociation region (PDR) associated with the compact H II region DR 21 (Main). From the OH 18 cm absorption, observed with the Karl G. Jansky Very Large Array, we find that the line-of-sight magnetic field in this region is ∼0.13 mG. The same transitions in maser emission towards the neighbouring DR 21(OH) and W 75S-FR1 regions also exhibit Zeeman splitting. Along with the OH data, we use [C II] 158 μm line and hydrogen radio recombination line data to constrain the physical conditions and the kinematics of the region. We find the OH column density to be ∼3.6 × 10¹⁶ (T_ex/25 K) cm⁻², and that the 1665 and 1667 MHz absorption lines originate from the gas in the PDR where OH and C⁺ coexist. Under reasonable assumptions, we find the measured magnetic field strength for the PDR to be lower than the value expected from the commonly discussed density–magnetic field relation, while the field strengths estimated from the maser emission are roughly consistent with it. Finally, we compare the magnetic field energy density with the overall energetics of DR 21's PDR and find that, in its current evolutionary stage, the magnetic field is not dynamically important.
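To show why a ∼0.13 mG field is hard to detect, the sketch below converts a line-of-sight field strength into the Zeeman splitting it produces in the OH main lines, and back. It uses B_los = Δν / Z. The splitting factors (about 3.27 Hz μG⁻¹ at 1665 MHz and 1.96 Hz μG⁻¹ at 1667 MHz) are commonly quoted literature values that should be treated as assumptions here, since the abstract does not state them. The function names are illustrative.

```python
# Line-of-sight field from a measured Zeeman splitting: B_los = delta_nu / Z.
# The Z values are commonly quoted splitting factors for the OH 18 cm main
# lines; treat them as assumptions to verify against your own references.
Z_HZ_PER_UG = {1665: 3.27, 1667: 1.96}  # Hz per microgauss (assumed)
C_KM_S = 299_792.458  # speed of light, km/s


def b_los_ug(delta_nu_hz: float, line_mhz: int) -> float:
    """Line-of-sight field strength in microgauss from a frequency splitting."""
    return delta_nu_hz / Z_HZ_PER_UG[line_mhz]


def splitting_hz(b_ug: float, line_mhz: int) -> float:
    """Expected frequency splitting for a given field strength."""
    return b_ug * Z_HZ_PER_UG[line_mhz]


def splitting_km_s(b_ug: float, line_mhz: int) -> float:
    """The same splitting expressed as a velocity shift, dv = c * dnu / nu."""
    return C_KM_S * splitting_hz(b_ug, line_mhz) / (line_mhz * 1e6)
```

For 0.13 mG (130 μG), `splitting_hz(130, 1665)` is about 425 Hz, or roughly 0.077 km s⁻¹ as a velocity shift. That is a small fraction of a kilometre per second, which is why the splitting is usually extracted from the difference between the two circular polarizations rather than seen as separate lines.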
Full Text Available
